19 research outputs found

    EPSILON: an eQTL prioritization framework using similarity measures derived from local networks

    Motivation: When genomic data are associated with gene expression data, the resulting expression quantitative trait loci (eQTL) will likely span multiple genes. eQTL prioritization techniques can be used to select the most likely causal gene affecting the expression of a target gene from a list of candidates. As an input, these techniques use physical interaction networks that often contain highly connected genes and unreliable or irrelevant interactions that can interfere with the prioritization process. We present EPSILON, an extendable framework for eQTL prioritization, which mitigates the effect of highly connected genes and unreliable interactions by constructing a local network before a network-based similarity measure is applied to select the true causal gene. Results: We tested the new method on three eQTL datasets derived from yeast data using three different association techniques. A physical interaction network was constructed, and each eQTL in each dataset was prioritized using the EPSILON approach: first, a local network was constructed using a k-trials shortest path algorithm, followed by the calculation of a network-based similarity measure. Three similarity measures were evaluated: random walks, the Laplacian Exponential Diffusion kernel and the Regularized Commute-Time kernel. The aim was to predict knockout interactions from a yeast knockout compendium. EPSILON outperformed two reference prioritization methods, random assignment and shortest path prioritization. Next, we found that using a local network significantly increased prioritization performance in terms of predicted knockout pairs when compared with using exactly the same network similarity measures on the global network, with an average increase in prioritization performance of 8 percentage points (P < 10^-5).
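
    The two kernels named in this abstract can be illustrated on a toy graph. The sketch below is not the EPSILON implementation: it uses the textbook definitions K = exp(-alpha * L) for the Laplacian Exponential Diffusion kernel and K = (D - alpha * A)^-1 for the Regularized Commute-Time kernel, where A is the adjacency matrix, D the degree matrix and L = D - A. The function names, the alpha values, the four-node path graph and the ranking helper are all assumptions made for illustration, and the local-network construction step is omitted.

```python
import numpy as np


def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A of an undirected graph."""
    return np.diag(adj.sum(axis=1)) - adj


def led_kernel(adj, alpha=0.5):
    """Laplacian Exponential Diffusion kernel K = exp(-alpha * L).

    L is symmetric, so the matrix exponential is computed from its
    eigendecomposition. alpha = 0.5 is an arbitrary illustrative value.
    """
    eigvals, eigvecs = np.linalg.eigh(laplacian(adj))
    return (eigvecs * np.exp(-alpha * eigvals)) @ eigvecs.T


def rct_kernel(adj, alpha=0.9):
    """Regularized Commute-Time kernel K = (D - alpha * A)^-1, 0 < alpha < 1.

    Assumes every node has at least one edge, so the matrix is invertible.
    """
    deg = np.diag(adj.sum(axis=1))
    return np.linalg.inv(deg - alpha * adj)


def rank_candidates(kernel, target, candidates):
    """Hypothetical helper: order candidate nodes by similarity to a target."""
    return sorted(candidates, key=lambda c: kernel[c, target], reverse=True)


# Toy undirected path graph 0 - 1 - 2 - 3 (illustrative data, not from the paper).
ADJ = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

if __name__ == "__main__":
    # Rank nodes 3 and 1 by diffusion-kernel similarity to target node 0.
    print(rank_candidates(led_kernel(ADJ), target=0, candidates=[3, 1]))
```

    In this toy setting a candidate closer to the target in the graph receives a higher kernel value, which is the intuition behind using such measures to rank candidate genes within a locus.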

    Genome-wide detection of predicted non-coding RNAs in Rhizobium etli expressed during free-living and host-associated growth using a high-resolution tiling array

    Non-coding RNAs (ncRNAs) play a crucial role in the intricate regulation of bacterial gene expression, allowing bacteria to quickly adapt to changing environments. In the past few years, a growing number of regulatory RNA elements have been predicted by computational methods, mostly in well-studied gamma-proteobacteria but lately in several alpha-proteobacteria as well. Here, we have compared an extensive compilation of these non-coding RNA predictions to intergenic expression data of a whole-genome high-resolution tiling array in the soil-dwelling alpha-proteobacterium Rhizobium etli.

    Stress response regulators identified through genome-wide transcriptome analysis of the (p)ppGpp-dependent response in Rhizobium etli

    Background: The alarmone (p)ppGpp mediates a global reprogramming of gene expression upon nutrient limitation and other stresses to cope with these unfavorable conditions. Synthesis of (p)ppGpp is, in most bacteria, controlled by RelA/SpoT (Rsh) proteins. The role of (p)ppGpp has been characterized primarily in Escherichia coli and several Gram-positive bacteria. Here, we report the first in-depth analysis of the (p)ppGpp-regulon in an alpha-proteobacterium using a high-resolution tiling array to better understand the pleiotropic stress phenotype of a relA/rsh mutant. Results: We compared gene expression of the Rhizobium etli wild type and rsh (previously rel) mutant during exponential and stationary phase, identifying numerous (p)ppGpp targets, including small non-coding RNAs. The majority of the 834 (p)ppGpp-dependent genes were detected during stationary phase. Unexpectedly, 223 genes were expressed (p)ppGpp-dependently during early exponential phase, indicating the hitherto unrecognized importance of (p)ppGpp during active growth. Furthermore, we identified two (p)ppGpp-dependent key regulators for survival during heat and oxidative stress and one regulator putatively involved in metabolic adaptation, namely extracytoplasmic function sigma factor EcfG2/PF00052, transcription factor CH00371, and serine protein kinase PrkA. Conclusions: The regulatory role of (p)ppGpp in R. etli stress adaptation is far-reaching in redirecting gene expression during all growth phases. Genome-wide transcriptome analysis of a strain deficient in a global regulator, and exhibiting a pleiotropic phenotype, enables the identification of more specific regulators that control genes associated with a subset of stress phenotypes. This work is an important step toward a full understanding of the regulatory network underlying stress responses in alpha-proteobacteria.

    Query-based biclustering of gene expression data using Probabilistic Relational Models

    Background: With the availability of large-scale expression compendia, it is now possible to view one's own findings in the light of what is already available and to retrieve genes with an expression profile similar to a set of genes of interest (i.e., a query or seed set) for a subset of conditions. To that end, a query-based strategy is needed that maximally exploits the coexpression behaviour of the seed genes to guide the biclustering, but that at the same time is robust against the presence of noisy genes in the seed set, as seed genes are often assumed, but not guaranteed, to be coexpressed in the queried compendium. Therefore, we developed ProBic, a query-based biclustering strategy based on Probabilistic Relational Models (PRMs) that exploits the use of prior distributions to extract the information contained within the seed set.

    Network-based modeling of omics data

    No full text
    Understanding cellular behavior from a systems perspective requires the identification of functional and physical interactions among diverse molecular entities in a cell (i.e., DNA/RNA, proteins, metabolites, ...). A network-based representation captures many of the essential characteristics of these various biological systems. It can be exploited to study a molecular entity such as a protein in a wider context than in isolation, and can provide valuable insights into the system's mode of action and functionalities. Unraveling these molecular networks is considered one of the foremost challenges in current bioinformatics research. Powerful and scalable technologies have enabled the generation of genome-wide datasets that describe cellular systems by capturing the interactions of their building blocks or by characterizing the state of a system under different environmental stimuli. The distinct nature of these datasets often brings about complementary views on cellular behavior, and their integration holds the key to the successful reconstruction of the underlying networks. In the first part of the thesis we therefore aim to infer networks from diverse omics datasets. We present ProBic, a method to identify modules of genes that are co-expressed with a set of genes of interest (i.e., query or seed genes), together with the conditions in which they are co-expressed. These modules are termed biclusters and represent potentially co-regulated genes, highlighting part of the transcriptional network. We applied ProBic to a benchmark set in E. coli and showed that high-quality biclusters with biological relevance could be obtained. Next, we integrated information from several functional datasets to predict protein-protein interactions from public experimental interaction data, using a naive Bayesian classifier. Clustering the obtained protein network illustrated the presence of functionally coherent modules and showed the opportunity of assigning novel gene functions based on cluster functionality. Viewing a single entity or an experimental dataset in the light of an interaction network can reveal previously unknown insights into biological processes or functional behavior. Methodologies that identify and explore paths in networks between given input and output nodes have gained much interest. Such a path in a network can be seen as a mechanistic representation of the way information propagates through the network. Identifying biologically meaningful paths in the network between nodes of interest, which can be defined from functional datasets that are independent of the network itself, can unveil previously undiscovered signal flow mechanisms that are responsible for the observed functional behavior, or define a measure for the relatedness of two nodes in the network. In the second part of the thesis we present a novel network-based method to interpret eQTL data. One of the challenges in eQTL analysis, besides the identification of genetic loci that are associated with gene expression, is the exact identification of the true causal gene in an associated locus, that is, the gene causing the variation in gene expression. The method therefore aims to prioritize candidate genes in a locus based on their network-relatedness to a set of associated target genes. We show that the approach outperforms a state-of-the-art gene prioritization method when a common causal factor is present for a set of targets, based on several performance criteria. Finally, we applied the methodology to a biological dataset in yeast that combined both genomic and expression variation of a pool of yeast segregants. We predicted several genes to be involved in ethanol production capacity, which is an important phenotype in the fermentation industry.

    Network-based functional modeling of genomics, transcriptomics and metabolism in bacteria

    No full text
    Molecular entities present in a cell (mRNA, proteins, metabolites, ...) do not act in isolation, but rather in cooperation with each other to define an organism's form and function. Their concerted action can be viewed as networks of interacting entities that are active under certain conditions within the cell or upon certain environmental signals. A main challenge in systems biology is to model these networks, in other words to study which entities interact to form cellular systems or accomplish similar functions. Conversely, viewing a single entity or an experimental dataset in the light of an interaction network can reveal previously unknown insights into biological processes. In this review we give an overview of how integrated networks can be reconstructed from multiple omics data and how they can subsequently be used for network-based modeling of cellular function in bacteria.

    Path finding in biological networks

    No full text

    Omics derived networks in bacteria

    No full text
    Understanding cellular behavior from a systems perspective requires the identification of functional and physical interactions among diverse molecular entities in a cell (i.e., DNA/RNA, proteins and metabolites). Powerful and scalable technologies have enabled the generation of genome-wide datasets that describe cellular systems by capturing the interactions of their building blocks under different environmental stimuli. The most straightforward way to represent such datasets is by means of molecular networks whose nodes correspond to molecular entities and whose edges correspond to the interactions among those entities. In this review we give an overview of the different functional and physical interaction networks in bacteria that have been, or potentially can be, built by the integration of diverse omics datasets.

    PheNetic: network-based interpretation of unstructured gene lists in E. coli

    No full text
    At present, omics experiments are commonly used in wet lab practice to identify leads involved in interesting phenotypes. These omics experiments often result in unstructured gene lists, the interpretation of which in terms of pathways or the mode of action is challenging. To aid in the interpretation of such gene lists, we developed PheNetic, a decision-theoretic method that exploits publicly available information, captured in a comprehensive interaction network, to obtain a mechanistic view of the listed genes. PheNetic selects from an interaction network the sub-networks highlighted by these gene lists. We applied PheNetic to an Escherichia coli interaction network to reanalyse a previously published KO compendium, assessing gene expression of 27 E. coli knock-out mutants under mild acidic conditions. Being able to unveil previously described mechanisms involved in acid resistance demonstrated both the performance of our method and the added value of our integrated E. coli network.